
    Structured Dictionary Learning for Energy Disaggregation

    Growing awareness of the environmental impact of energy consumption has increased the focus on reducing energy demand. Feedback on appliance-level energy consumption can help consumers reduce their demand. Energy disaggregation techniques recover appliance-level consumption from the aggregated consumption of a house. Because these techniques extract the consumption pattern of an individual appliance as features, they face the challenge of distinguishing two appliances with similar consumption. To address this challenge, we develop methods that leverage the fact that some devices tend to operate concurrently in specific operating modes. The aggregated consumption patterns of a subgroup of devices allow us to identify the concurrent operating modes of the devices in that subgroup. We therefore design hierarchical methods that replace overall energy disaggregation across all devices with a recursive disaggregation task over device subgroups. Experiments on two real-world datasets show that our methods improve on baselines. One of our approaches, the Greedy-based Device Decomposition Method (GDDM), achieves up to 23.8%, 10%, and 59.3% improvement in micro-averaged F-score, macro-averaged F-score, and Normalized Disaggregation Error (NDE), respectively.
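    The abstract describes the general idea of dictionary-based disaggregation without its details. As an illustrative sketch only (not the paper's GDDM or its hierarchical subgrouping scheme), the basic building block can be shown as follows: given hypothetical per-appliance dictionaries of consumption patterns, the aggregate signal is decoded by nonnegative least squares and sliced back into per-appliance estimates. All appliance names and signature values here are made up for the example.

```python
import numpy as np

def disaggregate(aggregate, dictionaries, n_iter=1000, lr=0.05):
    """Split an aggregate power signal across per-appliance dictionaries.

    aggregate    : (T,) total consumption over T time steps
    dictionaries : list of (T, k_i) nonnegative atom matrices, one per appliance
    Returns per-appliance signal estimates via nonnegative projected gradient.
    """
    B = np.hstack(dictionaries)               # (T, sum k_i) combined dictionary
    a = np.zeros(B.shape[1])                  # activation weights, one per atom
    for _ in range(n_iter):
        grad = B.T @ (B @ a - aggregate)      # least-squares gradient
        a = np.maximum(0.0, a - lr * grad)    # gradient step + nonnegativity
    # Slice activations back into per-appliance reconstructions
    parts, start = [], 0
    for D in dictionaries:
        k = D.shape[1]
        parts.append(D @ a[start:start + k])
        start += k
    return parts

# Toy example: two single-atom dictionaries with disjoint "on" periods.
fridge = np.array([[1.], [1.], [1.], [0.], [0.], [0.]])
heater = np.array([[0.], [0.], [0.], [2.], [2.], [2.]])
aggregate = np.array([3., 3., 3., 2., 2., 2.])  # 3x fridge atom + 1x heater atom
parts = disaggregate(aggregate, [fridge, heater])
```

    In a realistic setting the dictionaries would themselves be learned from per-appliance training signals, and the ill-posedness the abstract mentions (similar appliances sharing similar atoms) is exactly why the paper groups devices hierarchically instead of decoding all atoms at once.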

    Predicting Academic Performance: A Systematic Literature Review

    The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work.

    Genome sequencing reveals Zika virus diversity and spread in the Americas

    Although the recent Zika virus (ZIKV) epidemic in the Americas and its link to birth defects have attracted a great deal of attention, much remains unknown about ZIKV disease epidemiology and ZIKV evolution, in part owing to a lack of genomic data. Here we address this gap in knowledge by using multiple sequencing approaches to generate 110 ZIKV genomes from clinical and mosquito samples from 10 countries and territories, greatly expanding the observed viral genetic diversity from this outbreak. We analysed the timing and patterns of introductions into distinct geographic regions; our phylogenetic evidence suggests rapid expansion of the outbreak in Brazil and multiple introductions of outbreak strains into Puerto Rico, Honduras, Colombia, other Caribbean islands, and the continental United States. We find that ZIKV circulated undetected in multiple regions for many months before the first locally transmitted cases were confirmed, highlighting the importance of surveillance of viral infections. We identify mutations with possible functional implications for ZIKV biology and pathogenesis, as well as those that might be relevant to the effectiveness of diagnostic tests.

    Probabilistic Methods for Data-Driven Social Good

    Computational techniques have much to offer in addressing questions of societal significance. Many such questions can be framed as prediction problems and approached with data-driven methods. In addition to prediction, understanding human behavior is a distinguishing goal in societally relevant domains. In this work, I describe societally significant problems that can be solved with a collective probabilistic approach.

    These problems pose many challenges to techniques that assume data independence, homogeneity, and scale. In settings of societal importance, dependencies can define the data in question, from complex relationships between people to continuity between consecutive events. Rather than being generated by single, uniform sources, data in these domains can be derived from and described by heterogeneous sources. Finally, though many data-driven methods depend on large numbers of observations and high-quality labels to guarantee quality results, in domains of critical social value it is often infeasible to gather such quantities. These challenges demand methods that can utilize data dependencies, incorporate diverse forms of information, and reason over small numbers of instances with potentially ambiguous labels.

    There are also many opportunities in these domains. Models concerned with societally relevant problems can draw on the knowledge established by existing academic disciplines, from the social to the natural sciences. Such knowledge can inform each step of research, from choosing an appropriate problem to putting results into perspective. Furthermore, there are opportunities to obtain new insights into human behavior from the abundance of data generated by virtual and online activity and by mobile and sensor networks. The scale of this data necessitates computational methods; methods that can leverage prior knowledge and remain efficient even on large datasets have much to offer in these domains.

    In my work, I utilize a collective probabilistic approach for data-driven social good. This approach can capitalize on structure between data instances rather than flattening it. Furthermore, it can readily incorporate domain knowledge, which, especially when combined with a collective approach, is instrumental in learning from small datasets. When datasets are large, this approach leverages a class of probabilistic graphical models that offers efficient inference. Finally, this approach can be extended to model unobserved phenomena with latent-variable representations.

    I demonstrate the benefits of this approach in three societally relevant domains: sustainability, education, and malicious behavior. While these domains are diverse, the problems they present share several commonalities that are critical in data-driven modeling. For example, modeling data structure, from spatial relationships to social interactions, can reduce issues of sparsity and noise. Domain knowledge can also combat these issues, in addition to improving model interpretability. I show the benefits of domain knowledge in discovering sustainable products, predicting course performance, and detecting cyberbullying. In the domains of sustainability and malicious behavior, I demonstrate how to utilize spatio-temporal structure in the seemingly distinct tasks of disaggregating appliances and predicting the movements of human traffickers. In education and malicious behavior, I show how unobserved social structure is instrumental not only in modeling learning and aggression, but also in interpreting these dynamics in groups. In all three domains I show how to model, represent, and interpret latent structure. Thus, while making contributions to each problem setting and domain, I also contribute to the broader goal of data-driven modeling for social good.

    Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis

    In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists' gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed, called DataExplained, to justify both preferred and rejected analytic paths in real time. Analyses lacking sufficient detail or reproducible code, or containing statistical errors, were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both analytic paths taken and not taken. Implications are discussed for organizations and leaders whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists.
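    The abstract's central point, that the choice of operationalization alone can flip a conclusion, can be sketched with a toy example. The data below are entirely synthetic and chosen only to illustrate the mechanism; they are not from the study. Two hypothetical operationalizations of "status" (job-title seniority vs. citation count) applied to the same records yield effect estimates with opposite signs.

```python
# Synthetic records, purely illustrative: three speakers with a job-title
# rank (1 = most senior), a citation count, and words spoken in a meeting.
records = [
    {"rank": 1, "citations": 100, "words": 120},
    {"rank": 2, "citations": 900, "words": 150},
    {"rank": 3, "citations": 400, "words": 180},
]

def correlation_sign(xs, ys):
    """Sign of the sample covariance between xs and ys: +1, -1, or 0."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (cov > 0) - (cov < 0)

words = [r["words"] for r in records]

# Operationalization 1: status = job-title seniority (negate rank so
# larger values mean higher status).
by_rank = correlation_sign([-r["rank"] for r in records], words)

# Operationalization 2: status = citation count.
by_cites = correlation_sign([r["citations"] for r in records], words)
```

    Here `by_rank` and `by_cites` disagree in sign, so an analyst who equates status with seniority would conclude that higher status predicts less talking, while one who equates it with citations would conclude the opposite, from the same table. Multiverse tools such as Boba make this dependence explicit by enumerating the analysis paths rather than reporting a single one.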